SiteCrafting, Inc.
Adventures in Database Migration Pt. I
SiteCrafting is in the process of phasing out some of our older servers, and as an added bonus, the clients hosted on those servers are getting a MySQL jumpstart, leapfrogging over 4.1 to go straight from 4.0.24 to 5.0.32. Tragically, it's not quite as simple as dump | import. This is what I get for bothering my bosses for a few weeks not long after coming aboard about how nice stored procedures, updatable views, and triggers could be.
"The wonderful thing about standards," a wiser person than me once said, "is that there are so many of them." That's not the whole of it, though. One good thing about standards is that there are certain features one can generally rely upon to work, translate, port, etc. Assuming one works within them, rather than taking advantage of loopholes allowed by their not-entirely-compliant-but-we're-getting-there-and-anyway-isn't-this-way- easier-and-faster software. When people don't (and I'm not entirely innocent here), you run the risk of turning your simple upgrade into a serious project when your favorite software decides it's time to comply a bit more.
The list of incompatible changes between these versions is actually pleasantly short. Many are in realms that don't affect the migration. Others, such as the clipping of oversized numbers (nowadays when you specify that a NUMERIC or similar column is to have no more than two digits left of the decimal, MySQL will actually enforce this and clip 100 down to the nearest 99), can be reviewed with a quick glance at column definitions. However, one has stuck us with a bit more trouble:
The precedence of the comma operator also now is lower compared to JOIN, LEFT JOIN, and so forth.
Pretty innocuous, eh? Not so much. One of the key aspects of the ON clause in a JOIN statement is that it must compare to a column that is already a part of the output table. This is perfectly reasonable. Consider: SELECT * FROM a INNER JOIN b ON b.key = c.key. Where did c come from? Before you JOIN to b, a constitutes the entirety of the output table. The column c.key is nowhere to be found, and as such has no place in the ON clause.
This is all well and good; that query is patently ridiculous anyway. You JOIN tables based on the columns in the tables you've referenced already. Order is of minimal importance: SELECT * FROM a INNER JOIN b ON b.key = a.key INNER JOIN c ON c.key = b.key returns the same information as SELECT * FROM a INNER JOIN c ON c.key = b.key INNER JOIN b ON b.key = a.key. Column order is mixed up, but as a skilled query author, you're no doubt avoiding * and only specifying the columns you need. And as a writer of readable code I bet you're using an associative array rather than numerical indices to refer to your columns anyway (if not reading from the result directly into stand-alone variables).
This is all grand as long as all your operators retain the same precedence. And thus we come to the dreaded comma join. I've been avoiding commas in my FROM clauses for as long as I've understood how to use INNER and LEFT JOIN effectively. Standards aside, I think it makes a query more readable. Take these two examples:
SELECT *
FROM a, b, c, d
WHERE a.key = c.key1
AND c.key2 = b.key
AND d.key = a.keyvs.
SELECT *
FROM a
INNER JOIN c ON c.key1 = a.key
INNER JOIN b ON b.key = c.key2
INNER JOIN d ON d.key = a.key
Both convey the same data, and in either it's easy enough to which tables are being referenced: either all in a row or down a column (assuming you put line breaks in your queries). In the second, however, it is much easier to see just how a given table is included: the joining criteria is right there on the same line. Using commas, you're stuck digging through a WHERE clause to figure out just how that piece of info got in there (it may not seem so bad looking at these queries, but as they get bigger, and as query design gets sloppier, things just get unpleasant).
However, now the problems are much bigger. With the comma precedence lowering, nothing linked in via this method is recognized as even existing while the parser sorts out the JOIN statements. So you'd better not be using commas and LEFT JOINs in the same WHERE clause, or you're in for an adventure.
Where does this leave me? Searching through the code of two dozen clients for the word "JOIN", and madly wiping out the comma scourge with a massive influx of INNER JOIN. Exciting stuff, really. Personally, I'd recommend avoiding this altogether. Stick with the JOINs, people. The JOINs.
by Joe Izenman | 9/6/2007 4:45pm | Comments (13)
But I don't like the JOINs.
Left by Bernie Zimmermann | Sep 6, 2007
MySQL 4.1 introduced a number of major changes and is therefore a major upgrade in many ways.
It is recommended you first upgrade to 4.1 then upgrade to 5.0
Left by Ronald Bradford | Sep 6, 2007
We are going through this same thing at work. I really haven't blogged about but it sounds like we are experiencing the same problems. You need to give your developers sufficient time in a pre-production environment so that they can uncover all these little bug-a-boo's so you don't get blamed for problems!!
Left by Keith Murphy | Sep 6, 2007
I'd say that stored procs, triggers and views are not very mature in MySQL 5, so, if you plan to use them extensively consider the related risks (see www.mysqlperformanceblog.com for some examples).
I've also blogged about my experience migrating a database from Firebird to MySQL which shows some of MySQL's stored procs shortcomings.
Hope this helps
Left by pabloj | Sep 7, 2007
Okay, one at a time:
Bernie: well, that sounds like a personal problem :)
Ronald: were we actually performing a direct upgrade to the server's database, I probably would have recommended a step by step upgrade, as is suggested in the documentation. As this was just a migration to a new server, it felt... excessive. I don't think we've run into any troubles that we wouldn't have hit going step by step. Fortunately the environment is sufficiently small that rebuilding the grant tables by hand was a minor task, and there was very little going on that ran headfirst into major changes.
Keith: I have, in pre-production, set up development servers for 4.1 and 5.0, in addition to our old 4.0. So the first step that I have taken was to jump the local copies over, and test them on there before moving the production data over. Also, of course, now any new projects that go on our new servers have a proper development environment for everyone to play in.
Aaaaand pabloj: I know that not everything is in top form. I am a regular reader of mysqlperformanceblog, and have dug around in the documentation. The conclusion I've come to is this: make sure it really is the right move before you do it. But if it turns out it is the right move, well... it helps to at least have the capability. Given the choice I'd rather be on 5.0 and decide not to use a trigger than run into a situation on 4.0 where I need a trigger and I don't have the capacity.
Thanks, everyone, for all your input! I'm still pretty new at database administration in general, so every little bit helps.
Left by Joe Izenman | Sep 7, 2007
And I tuhoght I was the sensible one. Thanks for setting me straight.
Left by Demelza | Jan 17
ogmeDu It`s really useful! Looking through the Internet you can mostly observe watered down information, something like bla bla bla, but not here to my deep surprise. It makes me happy..!
Left by college basketball | Feb 26
9lC7Vs Im obliged for the article post.Much thanks again. Really Great.
Left by Discount OEM Software | Mar 7
I2fR1v Thanks-a-mundo for the article.Really looking forward to read more. Keep writing.
Left by Microsoft OEM Software | Mar 7
I really liked your article post.Really thank you! Will read on...
Left by LifeCell Reviews | Apr 4
I really liked your article.Much thanks again. Really Great.
Left by LifeCell Reviews | Apr 4
dDeAy1 I cannot thank you enough for the blog.Thanks Again. Really Cool.
Left by Request for Proposal | Apr 5
http://ussafepharm.com/ how to buy cialis generic drugs
Left by Buy Cialis | Apr 16